Simulation-Based Optimization of Markov Reward Processes: Implementation Issues
Authors
Abstract
We consider discrete-time, finite-state-space Markov reward processes that depend on a set of parameters. Previously, we proposed a simulation-based methodology for tuning the parameters to optimize the average reward. The resulting algorithms converge with probability 1, but may have high variance. Here we propose two approaches to reducing the variance, which, however, introduce a new bias into the update direction. We report numerical results indicating that the resulting algorithms are robust with respect to a small bias.
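To make the general idea concrete, here is a minimal sketch (not the authors' exact algorithm) of simulation-based gradient estimation for the average reward of a parametrized Markov reward process: a score-function (likelihood-ratio) estimator on a toy two-state chain whose transition probabilities depend on a scalar parameter `theta`. The chain, reward, and all function names are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def step(state, theta, rng):
    """From either state, move to state 1 with probability sigmoid(theta).

    Returns the next state, the reward earned, and the score, i.e. the
    derivative d/dtheta of log P(next_state | state, theta).
    """
    p1 = sigmoid(theta)
    nxt = 1 if rng.random() < p1 else 0
    score = (1.0 - p1) if nxt == 1 else -p1
    reward = 1.0 if nxt == 1 else 0.0
    return nxt, reward, score

def estimate_avg_reward_and_gradient(theta, horizon=200_000, seed=0):
    """Single-sample-path estimates of the average reward and its gradient."""
    rng = random.Random(seed)
    state, total_r, grad = 0, 0.0, 0.0
    for _ in range(horizon):
        state, r, score = step(state, theta, rng)
        grad += r * score   # reward weighted by the per-step score
        total_r += r
    return total_r / horizon, grad / horizon

# At theta = 0 the average reward is sigmoid(0) = 0.5, and its exact
# gradient is d/dtheta sigmoid(theta) = 0.25; the estimates should be close.
avg, g = estimate_avg_reward_and_gradient(0.0)
```

In this toy chain the per-step estimator `r * score` is unbiased; the variance-reduction techniques discussed in the paper address precisely the case where such estimators, accumulated over long trajectories, become too noisy.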
Similar resources
Simulation-based optimization of Markov decision processes: An empirical process theory approach
We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted...
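A common way to obtain such value-function estimates is Monte Carlo evaluation of truncated discounted returns. The sketch below is an illustrative assumption, not the paper's construction: it estimates the discounted value of a fixed uniform policy in a toy two-state MDP whose reward depends on both state and action.

```python
import random

def sample_return(gamma, horizon, rng):
    """One truncated discounted return from a toy 2-state chain.

    Reward is 0.5 * (state + action); action 1 pushes toward state 1.
    """
    state, g, discount = 0, 0.0, 1.0
    for _ in range(horizon):
        action = rng.randint(0, 1)                    # fixed uniform policy
        g += discount * 0.5 * float(state + action)
        state = 1 if rng.random() < 0.25 + 0.5 * action else 0
        discount *= gamma
    return g

def estimate_value(gamma=0.9, n_samples=5_000, horizon=100, seed=0):
    """Average of truncated sample returns; truncation error is bounded
    by gamma**horizon * r_max / (1 - gamma)."""
    rng = random.Random(seed)
    total = sum(sample_return(gamma, horizon, rng) for _ in range(n_samples))
    return total / n_samples
```

Since rewards here lie in [0, 1], any estimate must fall below 1 / (1 - gamma), and with identical sample paths a larger discount factor yields a larger value.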
Simulation-Based Optimization of Markov Reward Processes
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process that depends on a set of parameters. As a special case, the method applies to Markov Decision Processes where optimization takes place within a parametrized set of policies. The algorithm involves the simulation of a single sample path, and can be implemented on-line. A convergence result (with ...
A State Aggregation Approach to Singularly Perturbed Markov Reward Processes
In this paper, we propose a single-sample-path algorithm with state aggregation to optimize the average reward of singularly perturbed Markov reward processes (SPMRPs) with large-scale state spaces. It is assumed that such a reward process depends on a set of parameters. Unlike other kinds of Markov chains, SPMRPs have their own hierarchical structure. Based on this special s...
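The aggregation idea can be illustrated on a toy chain (this is an illustrative sketch, not the SPMRP algorithm of the paper): states are grouped into macro-states, transitions between groups are rare (the "singularly perturbed" structure), and per-group average rewards are accumulated along a single sample path. The chain and the aggregation map are assumptions for the example.

```python
import random

# Four states grouped into two macro-states.
AGGREGATE = {0: "A", 1: "A", 2: "B", 3: "B"}

def simulate(n_steps=50_000, seed=0):
    """Single-sample-path average reward per macro-state.

    Within-group transitions are frequent (prob. 0.95, uniform over the
    group); between-group transitions are rare (prob. 0.05), mimicking
    the two-timescale structure of a singularly perturbed chain.
    """
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    rewards = {"A": 0.0, "B": 0.0}
    state = 0
    for _ in range(n_steps):
        macro = AGGREGATE[state]
        counts[macro] += 1
        rewards[macro] += float(state)      # reward(state) = state
        if rng.random() < 0.05:
            state = rng.choice([s for s in range(4) if AGGREGATE[s] != macro])
        else:
            state = rng.choice([s for s in range(4) if AGGREGATE[s] == macro])
    return {m: rewards[m] / max(counts[m], 1) for m in counts}
```

Within each group the occupancy is uniform, so the per-group averages should land near 0.5 for group A (states 0, 1) and 2.5 for group B (states 2, 3).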
Covariance Matrix of Multivariate Reward Processes with Nonlinear Reward Functions
Multivariate reward processes with reward functions of constant rates, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced in Soltani (1996). In this work we study a multivariate process whose components are reward processes with nonlinear reward functions. The Laplace transform of the covar...